
Poster: aibek Date: Sep 21, 2013 2:31am
Forum: forums Subject: mangled UTF-8 text in forums

Today I noticed that multi-byte UTF-8 characters on the forum pages are getting mangled. Please check the pages linked below.

It is clear that at some stage a program assumed that the UTF-8 encoded text was CP1252 encoded and “converted” all the existing forum posts to UTF-8. (UTF-8 text on this page shows fine, so the conversion happened sometime in the last week or so.)
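
A minimal Python sketch of the suspected conversion, assuming it simply decoded each post's stored UTF-8 bytes as CP1252 and kept the resulting characters as the new text. The function name `mangle` is hypothetical. It also shows why only multi-byte characters were affected: ASCII bytes mean the same thing in both encodings.

```python
def mangle(text: str) -> str:
    """Simulate the suspected bug: the UTF-8 bytes of `text` are misread as
    CP1252, and the resulting characters are kept as if they were the real text."""
    utf8_bytes = text.encode("utf-8")
    # Each byte becomes one CP1252 character, so a 3-byte character turns into 3.
    # Note: Python's cp1252 codec raises on its five undefined bytes
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D), e.g. for U+201D RIGHT DOUBLE QUOTATION MARK.
    return utf8_bytes.decode("cp1252")


print(mangle("\u2018quoted\u2019"))  # prints â€˜quotedâ€™
print(mangle("plain ASCII"))         # ASCII bytes map to themselves
```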

Random pages containing mangled characters:
Text Archive forum: http://archive.org/post/938068/copyright-language-suggested-for-texts-faq
The Forum forum: http://archive.org/post/938603/read-online-version-pages-blank
Notice the out-of-the-blue â€˜, â€™, etc.

On the following page, if you set 'Input encoding' to CP1252/CR-LF, 'Output encoding' to UTF-8, and the text to be recoded to ' ‘ ' (LEFT SINGLE QUOTATION MARK, UTF-8: 0xE2 0x80 0x98), you will get ' â€˜ '.
http://kanjidict.stc.cx/recode.php
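
The same recode can be reproduced byte by byte in Python, as a rough stand-in for the web tool. This sketch (the helper name `cp1252_view` is hypothetical) pairs each UTF-8 byte of the quotation mark with the character CP1252 assigns to that byte value.

```python
def cp1252_view(text: str) -> list[tuple[int, str]]:
    """Pair each UTF-8 byte of `text` with the character CP1252 assigns to it."""
    return [(b, bytes([b]).decode("cp1252")) for b in text.encode("utf-8")]


# LEFT SINGLE QUOTATION MARK, the example used with recode.php above
for byte, char in cp1252_view("\u2018"):
    print(f"0x{byte:02X} -> U+{ord(char):04X} {char}")
# 0xE2 -> U+00E2 â
# 0x80 -> U+20AC €
# 0x98 -> U+02DC ˜
```

The RIGHT SINGLE QUOTATION MARK (0xE2 0x80 0x99) differs only in the last byte, which CP1252 maps to ™, giving â€™.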

This post was modified by aibek on 2013-09-21 09:31:44

Poster: anand-archive Date: Sep 22, 2013 9:05pm
Forum: forums Subject: Re: mangled UTF-8 text in forums

Thanks for the detailed report. The issue is fixed now.

Poster: tracey pooh Date: Sep 21, 2013 10:02am
Forum: forums Subject: Re: mangled UTF-8 text in forums

Thanks for the detailed info and report, much appreciated.

I've relayed it to the relevant engineer. Yes, we did convert our entire DB about 2 weeks ago, and since then we have occasionally been adjusting the character encoding of some tables, so your deduction about the conversion and its timing is probably very close to spot on 8-)
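
For illustration only, and not necessarily what was done here: a common way to undo a single CP1252 misreading is to reverse it, encoding the mangled text back to CP1252 bytes and decoding those bytes as UTF-8. The helper name `repair` is hypothetical. The sketch leaves a string unchanged when the reversal fails, which protects correct text but also means a post mixing intact and mangled characters is not fixed.

```python
def repair(text: str) -> str:
    """Undo one CP1252 misreading of UTF-8 text, if the text looks like one."""
    try:
        # Turn each mangled character back into the byte it came from,
        # then read those bytes as the UTF-8 they originally were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not cleanly reversible (e.g. correct text such as "café", or
        # characters CP1252 cannot encode): leave it unchanged rather than guess.
        return text


print(repair("\u00e2\u20ac\u02dcquoted\u00e2\u20ac\u2122"))  # prints ‘quoted’
print(repair("caf\u00e9"))  # prints café (left unchanged)
```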

Poster: aibek Date: Sep 21, 2013 6:20pm
Forum: forums Subject: Re: mangled UTF-8 text in forums

Thanks.